An introduction of the problem domain and a description of the variable(s) you are choosing to analyze (and why!)
The U.S. prison system is one of the starkest manifestations of racial inequality in the country. A complex set of social and political structures, including over-policing and the war on drugs, has led to the disproportionate incarceration of people of color. This report uses data from the Vera Institute to analyze incarceration trends and uncover patterns of inequality.
For this analysis, we focus on the total jail population and on the populations of specific racial groups: Black, White, Latinx, AAPI (Asian American and Pacific Islander), and Native American. These variables were chosen to highlight racial disparities in incarceration rates across counties and states over time.
An analysis of the total prison population across U.S. states from 2001 to 2016 reveals several key insights. In the most recent year in the dataset (2016), the average state prison population is approximately 30,773. Texas (TX) has the highest prison population in 2016, at 149,478, and North Dakota (ND) has the lowest, at 1,675. Over the ten years from 2006 to 2016, the total prison population summed across states changed by -54,700, a net decrease. These figures highlight both the disparities in prison populations among states and how those populations have changed over time.
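The summary figures in the paragraph above could be derived along the following lines. This is a minimal Python sketch, not the report's own code (the printed output appears to come from R). The column names `state`, `year`, and `total_prison_pop` follow the printed output; the sample rows, the state codes AA/BB/CC, and the function names are hypothetical. It assumes county rows are summed per state and that the ten-year change is computed only for states with data in both years.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical county-level rows (state, year, total_prison_pop). The values
# are made up for illustration and are NOT taken from the Vera dataset.
rows = [
    ("AA", 2006, 100.0), ("AA", 2006, 50.0),
    ("AA", 2016, 90.0), ("AA", 2016, None),  # None marks a missing value
    ("BB", 2006, 20.0), ("BB", 2016, 35.0),
    ("CC", 2016, 10.0),                       # CC has no 2006 data
]


def state_totals(rows):
    """Sum county values into {(state, year): total}, skipping missing values."""
    totals = defaultdict(float)
    for state, year, pop in rows:
        if pop is not None:
            totals[(state, year)] += pop
    return dict(totals)


def summarize(rows, start_year, end_year):
    """Compute the average, extremes, and start-to-end change per state."""
    totals = state_totals(rows)
    latest = {s: v for (s, y), v in totals.items() if y == end_year}
    earliest = {s: v for (s, y), v in totals.items() if y == start_year}
    # Assumption: change is only defined for states present in both years.
    change = {s: latest[s] - earliest[s] for s in latest if s in earliest}
    return {
        "average": mean(latest.values()),
        "highest": max(latest.items(), key=lambda kv: kv[1]),
        "lowest": min(latest.items(), key=lambda kv: kv[1]),
        "change_by_state": change,
        "total_change": sum(change.values()),
    }


summary = summarize(rows, 2006, 2016)
print(summary)
```

Restricting the change to states with data in both years means the cumulative figure covers fewer states than the average does, which is worth stating whenever the two numbers are reported together.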
Who collected the data?
The data was collected by the Vera Institute of Justice, an independent nonprofit national research and policy organization.
How was the data collected or generated?
The data was compiled from multiple sources, including the Bureau of Justice Statistics (BJS) and state departments of correction. Specifically, it incorporates information from the National Corrections Reporting Program (NCRP), the Deaths in Custody Reporting Program (DCRP), the Annual Survey of Jails (ASJ), and the Census of Jails (COJ).
Why was the data collected?
The data was collected to provide a comprehensive understanding of incarceration trends in the United States at the county level. This level of detail is necessary to understand the causes and consequences of incarceration, as county officials are the primary decision-makers regarding who is sent to jail or prison and for how long. The goal is to uncover patterns and disparities in incarceration, particularly along racial and geographic lines, and to inform policy decisions and reform efforts aimed at reducing mass incarceration and its associated inequalities.
How many observations (rows) are in your data?
153811
How many features (columns) are in the data?
38
What, if any, ethical questions or questions of power do you need to consider when working with this data?
Firstly, the potential for reinforcing stereotypes and biases against certain racial or ethnic groups must be managed carefully. The data reflects systemic inequalities, and it is essential to present findings in a way that highlights these disparities without stigmatizing the affected communities. Secondly, privacy concerns must be addressed, particularly when dealing with individual-level data, to ensure that no personally identifiable information is disclosed.
What are possible limitations or problems with this data? (at least 200 words)
One significant limitation is the potential for missing or incomplete data, particularly in earlier years or for certain jurisdictions. Data collection methods and definitions may have changed over time, leading to inconsistencies. Additionally, there may be gaps in data for smaller or less populous counties, which can affect the overall analysis. Another limitation is the reliance on reported data from correctional facilities, which may underreport or misreport certain metrics due to administrative errors or intentional misrepresentation.
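One way to gauge the missing-data concern is to measure, for each year, the share of records in which a given column is empty. The Python sketch below illustrates the idea under stated assumptions: the records, the county labels, and the column names `total_jail_pop` and `black_jail_pop` are hypothetical stand-ins, and `None` is assumed to mark an unreported value.

```python
# Hypothetical records; None marks a value the source did not report.
records = [
    {"year": 1990, "county": "A", "total_jail_pop": None, "black_jail_pop": None},
    {"year": 1990, "county": "B", "total_jail_pop": 40.0, "black_jail_pop": None},
    {"year": 2016, "county": "A", "total_jail_pop": 55.0, "black_jail_pop": 20.0},
    {"year": 2016, "county": "B", "total_jail_pop": 42.0, "black_jail_pop": None},
]


def missing_share_by_year(records, column):
    """Return {year: fraction of records whose `column` is missing}."""
    counts = {}  # year -> (missing count, total count)
    for rec in records:
        missing, total = counts.get(rec["year"], (0, 0))
        counts[rec["year"]] = (missing + (rec[column] is None), total + 1)
    return {year: m / t for year, (m, t) in sorted(counts.items())}


jail_missing = missing_share_by_year(records, "total_jail_pop")
black_missing = missing_share_by_year(records, "black_jail_pop")
print(jail_missing)
print(black_missing)
```

A share that differs sharply between years or between racial-group columns would suggest that comparisons over time or across groups rest on uneven coverage.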
Include a chart. Make sure to describe why you included the chart, and what patterns emerged
## # A tibble: 1 × 1
##   average_prison_pop
##                <dbl>
## 1             30773.
## # A tibble: 1 × 3
##   state  year total_prison_pop
##   <chr> <int>            <dbl>
## 1 TX     2016           149478
## # A tibble: 1 × 3
##   state  year total_prison_pop
##   <chr> <int>            <dbl>
## 1 ND     2016             1675
## # A tibble: 33 × 2
##    state change
##    <chr>  <dbl>
##  1 AL     -1134
##  2 AZ      6362
##  3 CA    -45152
##  4 CO     -2442
##  5 FL      5886
##  6 GA      -327
##  7 IA      -167
##  8 IN      -253
##  9 KY      2897
## 10 ME       452
## # ℹ 23 more rows
## # A tibble: 1 × 1
##   total_change
##          <dbl>
## 1      -54700.
The first chart that you will create and include will show the trend over time of your variable/topic. Think carefully about what you want to communicate to your user (you may have to find relevant trends in the dataset first!). Here are some requirements to help guide your design:
When we say “clear” or “human readable” titles and labels, that means that you should not just display the variable name.
Include a chart. Make sure to describe why you included the chart, and what patterns emerged
The second chart that you will create and include will show how two different (continuous) variables are related to one another. Again, think carefully about what such a comparison means and what you want to communicate to your user (you may have to find relevant trends in the dataset first!). Here are some requirements to help guide your design:
Include a chart. Make sure to describe why you included the chart, and what patterns emerged
The last chart that you will create and include will show how a variable is distributed geographically. Again, think carefully about what such a comparison means and what you want to communicate to your user (you may have to find relevant trends in the dataset first!). Here are some requirements to help guide your design: